feat: add native Google/Gemini Embedding 2 support with Parts API#718

Open
ZaynJarvis wants to merge 17 commits into volcengine:main from ZaynJarvis:feat/google-embedding-native-api

Conversation


@ZaynJarvis ZaynJarvis commented Mar 17, 2026

Overview

This PR adds native Google Gemini Embedding 2 support using Google's official embedding API instead of the OpenAI-compatible format.

Status: 🔍 Code reviewed. Pending real-world testing.

Key Changes

  • Native API Integration: Uses Google's native embedding API endpoint (/v1beta/models/gemini-embedding-2-preview:embedContent) with Parts format
  • Gemini Embedding 2 Only: Focused implementation supporting only gemini-embedding-2-preview (3072 dimensions with MRL support)
  • Task-Specific Embeddings: Supports RETRIEVAL_QUERY, RETRIEVAL_DOCUMENT, SEMANTIC_SIMILARITY, CLASSIFICATION, and CLUSTERING task types
  • Flexible Parameter Format: Supports both simple format (e.g., 'RETRIEVAL_QUERY') and key=value format (e.g., 'task_type=RETRIEVAL_QUERY,output_dimensionality=1024')
  • Matryoshka Reduction: Built-in support for dimension reduction using output_dimensionality parameter
  • Future Multimodal Ready: Uses Parts API structure that can be extended for multimodal content
  • Chunking Support: Automatic text chunking and averaging for oversized inputs
  • Updated Documentation: Added configuration examples and provider documentation
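
The flexible parameter format described above could be parsed along these lines. This is an illustrative sketch only; `parse_embed_params` and its exact behavior are assumptions, not the PR's actual helper:

```python
def parse_embed_params(spec: str) -> dict:
    """Parse either a bare task type ('RETRIEVAL_QUERY') or a key=value
    list ('task_type=RETRIEVAL_QUERY,output_dimensionality=1024')."""
    params = {}
    if "=" not in spec:
        # Simple format: the whole string is the task type.
        params["task_type"] = spec
    else:
        for pair in spec.split(","):
            key, _, value = pair.partition("=")
            params[key.strip()] = value.strip()
    # output_dimensionality, if present, is an integer (Matryoshka reduction).
    if "output_dimensionality" in params:
        params["output_dimensionality"] = int(params["output_dimensionality"])
    return params
```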

Testing Needed

  • Real-world testing with a Google API key
  • Verify task-specific embeddings work correctly
  • Test Matryoshka dimension reduction
  • Validate chunking for oversized inputs

- Replace OpenAI-compatible implementation with native Gemini API
- Support task-specific embeddings (RETRIEVAL_QUERY, RETRIEVAL_DOCUMENT, etc.)
- Add Matryoshka dimension reduction support
- Include chunking for oversized texts
- Add configuration examples and documentation
- Support both simple and key=value parameter formats
- Use Parts API for future multimodal capability
- Remove stray server.pid file
- Fix base URL to https://generativelanguage.googleapis.com/v1beta
- Use x-goog-api-key header instead of URL parameter
- Remove model field from request body (already in URL)
- Follow official Google API format exactly
- Remove support for text-embedding-004 and text-embedding-005
- Focus implementation on gemini-embedding-2-preview only
- Add model validation to ensure only supported model is used
- Update documentation to reflect single model support
- Clarify that this is specifically for Gemini Embedding 2
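
Putting the commit fixes above together, the native request shape can be sketched as follows. This is illustrative, not the PR's actual embedder code; `embed_text` is a hypothetical helper, and the `embedding.values` response access assumes the documented Gemini response shape:

```python
import requests

BASE_URL = "https://generativelanguage.googleapis.com/v1beta"
MODEL = "gemini-embedding-2-preview"

def embed_text(text: str, api_key: str) -> list[float]:
    # The model is part of the URL; it is NOT repeated in the request body.
    url = f"{BASE_URL}/models/{MODEL}:embedContent"
    body = {"content": {"parts": [{"text": text}]}}  # Parts API structure
    resp = requests.post(
        url,
        json=body,
        headers={"x-goog-api-key": api_key},  # header, not a URL parameter
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json()["embedding"]["values"]
```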
ZaynJarvis and others added 2 commits March 18, 2026 09:32
- Covers basic functionality, advanced features, error handling
- Includes 11 test scenarios with expected outcomes
- Provides configuration examples and debug commands
- Ready for real-world testing with provided API key
@ZaynJarvis ZaynJarvis marked this pull request as ready for review March 18, 2026 03:46
ZaynJarvis (author):

@qin-ctx /review


@qin-ctx qin-ctx left a comment


Review Summary

Found 4 blocking bugs that prevent embed() from functioning at runtime, plus 4 non-blocking suggestions.

Blocking Issues

  1. _estimate_tokens() method is not defined anywhere in the class hierarchy — every embed() call will crash with AttributeError
  2. _chunk_text() method is not defined in the embedder class hierarchy — chunking logic will crash
  3. self.max_tokens vs self._max_tokens — attribute name mismatch causes AttributeError
  4. cfg.max_tokens — field does not exist on EmbeddingModelConfig, factory lambda will crash

Non-blocking

  1. Inconsistent camelCase/snake_case in API request body (taskType vs output_dimensionality)
  2. No retry mechanism for HTTP requests
  3. No automated unit tests
  4. Gemini model row added to Volcengine model table in docs

def _chunk_and_embed(self, text: str, is_query: bool = False) -> EmbedResult:
    """Chunk oversized text and average the embeddings.

    Args:
qin-ctx (reviewer):

[Bug] (blocking) _chunk_text() is not defined in the embedder class hierarchy.

_chunk_and_embed() calls self._chunk_text(text, self.max_tokens), but this method does not exist in GoogleDenseEmbedder or any of its parent classes. The only _chunk_text in the codebase is a @staticmethod on SessionCompressor (in openviking/session/compressor.py), which is unrelated to embedders.

Also, self.max_tokens should be self._max_tokens (same attribute name mismatch as above).

ZaynJarvis (author):

Checked: _chunk_text() is defined in EmbedderBase (base.py:121) and inherited. Real bug found and fixed: the call was passing two args (self._chunk_text(text, self.max_tokens)) to a method that only accepts one. Removed the extra argument.
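
The chunk-and-average strategy under discussion can be sketched like this. These are illustrative stand-ins, not the actual `EmbedderBase` helpers (whose signatures differ, as this thread shows):

```python
def estimate_tokens(text: str) -> int:
    # Crude heuristic: roughly 4 characters per token.
    return max(1, len(text) // 4)

def chunk_text(text: str, max_tokens: int = 8192) -> list[str]:
    # Split into character windows sized from the token budget.
    max_chars = max_tokens * 4
    return [text[i:i + max_chars] for i in range(0, len(text), max_chars)]

def chunk_and_embed(text: str, embed_fn, max_tokens: int = 8192) -> list[float]:
    """Embed oversized text by chunking it and averaging the chunk vectors."""
    chunks = chunk_text(text, max_tokens)
    vectors = [embed_fn(c) for c in chunks]
    dim = len(vectors[0])
    # Element-wise mean across chunk embeddings.
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]
```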

"api_key": cfg.api_key,
"api_base": cfg.api_base,
"dimension": cfg.dimension,
**({"query_param": cfg.query_param} if cfg.query_param else {}),
qin-ctx (reviewer):

[Bug] (blocking) cfg.max_tokens does not exist on EmbeddingModelConfig.

EmbeddingModelConfig (pydantic model with extra="forbid") has no max_tokens field. Accessing cfg.max_tokens will raise AttributeError. The max_tokens field exists on VLMConfig but not on EmbeddingModelConfig.

Suggested fix: Either add a max_tokens field to EmbeddingModelConfig, or remove this line and handle the default inside GoogleDenseEmbedder.__init__ (which already defaults to 8192).

ZaynJarvis (author):

Checked: max_tokens field IS defined on EmbeddingModelConfig (embedding_config.py:54-57). No fix needed.

# Build request body using Parts API
request_body = {"content": {"parts": [{"text": text}]}}

# Add task-specific parameters
qin-ctx (reviewer):

[Design] (non-blocking) No retry mechanism for API requests.

This uses raw requests.post() without any retry logic. Other providers (Jina, Voyage, OpenAI) benefit from the OpenAI client's built-in retry mechanism. The base module already provides exponential_backoff_retry in openviking/models/embedder/base.py which could be used here to handle transient network failures.

ZaynJarvis (author):

Fixed: wrapped requests.post with exponential_backoff_retry from base module, retrying on ConnectionError and Timeout.
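
A generic wrapper in the spirit of that fix might look like the following. It is a sketch only; the codebase's `exponential_backoff_retry` helper may have a different name and signature:

```python
import time

def retry_with_backoff(fn, retryable=(ConnectionError, TimeoutError),
                       max_attempts=4, base_delay=0.5):
    """Call fn(), retrying on transient errors with exponentially growing delays."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except retryable:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            time.sleep(base_delay * (2 ** attempt))  # 0.5s, 1s, 2s, ...
```

In the embedder this would wrap the `requests.post` call, catching only transient network errors so that API-level failures (bad key, invalid request) still fail fast.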

ZaynJarvis (author):

will fix.

@@ -128,6 +128,7 @@ Embedding model configuration for vector search, supporting dense, sparse, and h
|-------|-----------|------------|-------|
| `doubao-embedding-vision-250615` | 1024 | multimodal | Recommended |
| `doubao-embedding-250615` | 1024 | text | Text only |
qin-ctx (reviewer):

[Suggestion] (non-blocking) gemini-embedding-2-preview is added to the "Available Models" table which currently only contains Volcengine doubao-* models. This could be confusing since they are from different providers. Consider either adding a "Provider" column to the table, or listing Google models in a separate table under the Google provider section below.

ZaynJarvis (author):

Fixed: added a Provider column to the table so it is clear which provider each model belongs to.

- Fix _chunk_text called with extra arg (real bug: base method only accepts text)
- Fix inconsistent API key naming: output_dimensionality -> outputDimensionality
- Add exponential_backoff_retry for transient network failures
- Add Provider column to docs model table for clarity
ZaynJarvis (author):

All four bugs are possibly caused by #741; will fix now.

- Add max_tokens property, _estimate_tokens, _chunk_text (+ helpers) to
  GoogleDenseEmbedder — these were removed from base class in main
- Restore max_tokens field on EmbeddingModelConfig for google factory
- Both snake_case (task_type) and camelCase (taskType) are accepted by the API
- All task type values produce identical embeddings in this model version
- Parameter is forwarded for forward compatibility with future model versions
gemini-embedding-2-preview silently ignores taskType — verified 2026-03-19
at full 3072 dims, all task types return bit-for-bit identical vectors.
Remove query_param, document_param, _parse_param_string, _build_request_params.
Add note in docstring. Update factory and tests accordingly.
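
The "bit-for-bit identical" claim above amounts to a check like the following (a sketch; `vectors_identical` is a hypothetical helper, and exact float equality is intentional here, not a tolerance comparison):

```python
def vectors_identical(embeddings: dict[str, list[float]]) -> bool:
    """Return True if the embeddings returned for every task type are
    bit-for-bit identical (exact equality of every float)."""
    reference = next(iter(embeddings.values()))
    return all(vec == reference for vec in embeddings.values())
```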
…mbedder

- Rename embedding provider name from "google" to "gemini" throughout
  config, validation, factory registry, and docs
- Add max_tokens param/property to OpenAIDenseEmbedder (default 8000)
- Forward max_tokens from config to openai and ollama factory lambdas
  so user-configured chunking thresholds are not silently ignored

Labels

None yet

Projects

Status: Backlog

2 participants